The largest and most advanced synthetic dataset ever created for AI training, with 41 billion tokens, aims to level the playing field, enabling open, community-driven intelligence to thrive outside Big Tech’s walls
24 October 2025 – Tether Data’s AI research division, QVAC, has released the largest synthetic dataset ever created for artificial intelligence training under a new initiative called QVAC Genesis. This first release, Genesis I, a massive collection of 41 billion text tokens, is designed to help the world build smarter, more capable, and highly precise STEM-focused language models. Each “text token” represents a tiny fragment of language, the building blocks that AI models use to understand and generate text. By training on 41 billion of these tokens from QVAC Genesis’s dataset, models grasp not just words, but the relationships and logic that connect them.
This dataset has been rigorously validated across educational and scientific benchmarks, demonstrating superior reasoning and problem-solving performance in subjects such as mathematics, physics, biology, and medicine. It is the first publicly available synthetic dataset specifically built and rigorously validated for education-specific content, offering comprehensive coverage across key STEM domains where today’s public training datasets fall short.
More than a technical milestone, this release is a statement about who should own the future of intelligence. As AI becomes increasingly centralized, with models trained, hosted, and controlled by a handful of corporations, QVAC Genesis I is working to return that power to the people by providing open, high-quality data for advancing scientific research.
Tether Data today also released its first consumer app, QVAC Workbench, a comprehensive workspace that demonstrates the potential of local, on-device artificial intelligence. QVAC Workbench currently targets AI enthusiasts, advanced users, and researchers. It already supports a wide variety of LLMs and other AI models, including Llama, MedGemma, Qwen, SmolVLM, Whisper, and many more.
The app is available for smartphones (Android for now, and iOS within a few days) as well as desktop platforms (Windows, macOS, and Linux), providing more comprehensive on-device support than existing offerings.
With QVAC Workbench, all chats and interactions with the AI models remain local on-device, where data is owned by the user and remains 100% private. It also offers a unique feature called “Delegated Inference,” which allows a user to connect their mobile Workbench app peer-to-peer with the Workbench desktop app to fully utilize the power and resources of their home or office workstations.
“Intelligence shouldn’t be centralized,” said Paolo Ardoino, CEO of Tether. “With QVAC Workbench and Genesis I, we’re opening the door to infinite intelligence, AI that lives, learns, and evolves locally on your own device. We believe that intelligence, like information, should be free, accessible, and owned by everyone, not locked behind corporate firewalls or sold as a service. Whether it’s a phone, a robot, or a wearable, intelligence should belong to the individual, not the institution. QVAC Genesis I represents a future where people, not platforms, control how knowledge is created, shared, and used. It’s about restoring balance, bringing intelligence back to the edge, where it belongs, and ensuring the freedom to build and learn is universal.”
By making the QVAC Genesis dataset public, QVAC invites researchers to build and use models that can compete with, and even surpass, proprietary systems. The dataset was created using a multi-stage generation and validation process that turns high-quality scientific and educational materials into structured learning data. The result is a training resource that helps models reason, solve problems, and think critically, rather than merely imitate language.
“Most AI today sounds smart, but doesn’t truly think,” continued Ardoino. “We designed this dataset to help models understand cause and effect, to make connections, draw conclusions, and reason their way through complexity. And we’re making it open to everyone.”
The release of the first two QVAC projects is part of a broader mission to reshape how AI exists in the world, introducing a new paradigm of ‘local intelligence,’ where tools learn and evolve directly on any device.
The full technical breakdown of the dataset, code-named QVAC Genesis I, is available now in the accompanying research blog, “QVAC Genesis I: the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for Pre-training.”
QVAC Workbench apps can be downloaded from the qvac.tether.dev website.
About Tether Data
Tether Data, S.A. de C.V. (“Tether Data”) is part of Tether’s broader vision to advance freedom, transparency, and innovation through technology. Its mission is to enable people and organizations to connect and share information directly, without unnecessary intermediaries. By creating secure, peer-to-peer systems, Tether Data gives users greater control over their data, communications, and digital interactions. Tether Data aims to redefine how information flows across networks by replacing centralized models with decentralized infrastructure designed for privacy, efficiency, and resilience. The company’s goal is to make global connectivity faster, safer, and more private, empowering individuals and institutions alike to exchange information freely and securely.
About QVAC
QVAC is Tether Data’s advanced AI research initiative dedicated to building open, decentralized, and adaptive intelligence systems. Its mission, “Local AI. Infinite Intelligence. No Compromise,” envisions a world where AI lives and learns on any device, empowering individuals and communities rather than concentrating power in corporate data centers.